
    TDL: A Type Description Language for Constraint-Based Grammars

    This paper presents TDL, a typed feature-based representation language and inference system. Type definitions in TDL consist of type and feature constraints over the boolean connectives. TDL supports open- and closed-world reasoning over types and allows for partitions and incompatible types. Working with partially as well as with fully expanded types is possible. Efficient reasoning in TDL is accomplished through specialized modules. (To appear in Proc. COLING-94.)
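
    The abstract's notion of type constraints combined over boolean connectives can be pictured with a minimal sketch (hypothetical Python modeling, not TDL's actual syntax or implementation):

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical constraint expressions over type symbols.
@dataclass(frozen=True)
class T:
    """Atomic type constraint."""
    name: str

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Not:
    arg: "Expr"

Expr = Union[T, And, Or, Not]

def satisfies(types: set, e: Expr) -> bool:
    """Check whether a set of asserted types satisfies a constraint."""
    if isinstance(e, T):
        return e.name in types
    if isinstance(e, And):
        return satisfies(types, e.left) and satisfies(types, e.right)
    if isinstance(e, Or):
        return satisfies(types, e.left) or satisfies(types, e.right)
    return not satisfies(types, e.arg)  # Not

# Invented example: a sign must be a word or a phrase, but not elliptical.
c = And(Or(T("word"), T("phrase")), Not(T("elliptical")))
print(satisfies({"word"}, c))                   # True
print(satisfies({"phrase", "elliptical"}, c))   # False
```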

    The DeepThought Core Architecture Framework

    The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing when combined with shallow methods for robustness. Classical information retrieval is extended by high-precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; and creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development, is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed, e.g., to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
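
    The standoff-annotation capability mentioned above can be illustrated with a short sketch (hypothetical Python, not the DeepThought framework's actual schema): annotations live in a separate layer and point into the unmodified base text by character offsets.

```python
import xml.etree.ElementTree as ET

text = "Smith visited Berlin."

# Standoff layers: each annotation references the base text by offsets
# instead of wrapping it inline, so output from different components
# (tokenizer, named entity recognizer, parser) can coexist.
annotations = [
    ("token", 0, 5, {}),
    ("token", 6, 13, {}),
    ("token", 14, 20, {}),
    ("ne", 0, 5, {"type": "person"}),
    ("ne", 14, 20, {"type": "location"}),
]

root = ET.Element("document")
ET.SubElement(root, "text").text = text
layer = ET.SubElement(root, "standoff")
for tag, start, end, attrs in annotations:
    el = ET.SubElement(layer, tag, start=str(start), end=str(end), **attrs)
    el.text = text[start:end]  # redundant copy, kept only for readability

print(ET.tostring(root, encoding="unicode"))
```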

    Hybrid robust deep and shallow semantic processing for creativity support in document production

    The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing when added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high-precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development, is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on Robust Minimal Recursion Semantics (RMRS) as a uniform representation language.

    TDL: a type description language for HPSG. Part 1: Overview

    Unification-based grammar formalisms have become the predominant paradigm in natural language processing (NLP) and computational linguistics (CL). Their success stems from the fact that they can be seen as high-level declarative programming languages for linguists, which allow them to express linguistic knowledge in a monotonic fashion. Moreover, such formalisms can be given a precise set-theoretical semantics. This paper presents TDL, a typed feature-based language and inference system, which is specifically designed to support highly lexicalized grammar theories like HPSG, FUG, or CUG. TDL allows the user to define possibly recursive, hierarchically ordered types consisting of type constraints and feature constraints over the boolean connectives ∧ (conjunction), ∨ (disjunction), and ¬ (negation). TDL distinguishes between avm types (open-world reasoning), sort types (closed-world reasoning), built-in types, and atoms, and allows the declaration of partitions and incompatible types. Working with partially as well as with fully expanded types is possible, both at definition time and at run time. TDL is incremental, i.e., it allows the redefinition of types and the use of undefined types. Efficient reasoning is accomplished through four specialized reasoners.
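
    A small sketch (hypothetical Python; names and representation are illustrative, not TDL's) of what the partition and incompatibility declarations buy: two distinct types drawn from the same partition can never be combined, which is closed-world reasoning over that slice of the hierarchy.

```python
# Hypothetical toy hierarchy: each type maps to its immediate supertypes.
supertypes = {
    "agr": ["top"],
    "sg": ["agr"],
    "pl": ["agr"],
    "3sg": ["sg"],
}

# Partition declaration: sg and pl are pairwise incompatible subtypes of agr.
partitions = [{"sg", "pl"}]

def ancestors(t: str) -> set:
    """All types reachable upwards from t, including t itself."""
    seen, stack = set(), [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(supertypes.get(x, []))
    return seen

def compatible(a: str, b: str) -> bool:
    """a and b are incompatible if their combined ancestor sets
    contain two distinct members of one partition."""
    up = ancestors(a) | ancestors(b)
    return all(len(up & p) <= 1 for p in partitions)

print(compatible("3sg", "sg"))  # True: 3sg is a subtype of sg
print(compatible("3sg", "pl"))  # False: sg and pl partition agr
```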

    Parameterized type expansion in the feature structure formalism TDL

    Over the last few years, unification-based grammar formalisms have become the predominant paradigm in natural language processing systems because of their monotonicity, declarativeness, and reversibility. From the viewpoint of computer science, typed feature structures can be seen as data structures that allow the representation of linguistic knowledge in a uniform fashion. Type expansion is an operation that makes the constraints on a typed feature structure explicit and determines their satisfiability. We describe an efficient expansion algorithm that takes care of recursive type definitions and allows exploration of different expansion strategies through the use of control knowledge. This knowledge is specified in a separate layer, independently of grammatical information. Memoization of the type expansion function drastically reduces the number of unifications. In the second part, nonmonotonic extensions to TDL and the implementation of well-typedness checks are presented. Both are closely related to the type expansion algorithm. The algorithms have been implemented in Common Lisp and are integrated parts of TDL and a large natural language dialog system.
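
    A hedged sketch of the core idea (hypothetical Python; the algorithm in the paper is considerably more elaborate and operates on full typed feature structures): expansion unfolds nested type definitions into explicit feature constraints, memoization ensures each (type, depth) pair is expanded only once, and a simple depth bound stands in for the control knowledge that decides where to stop unfolding recursive definitions.

```python
from functools import lru_cache

# Hypothetical type definitions: feature -> value type (no connectives here).
definitions = {
    "list": {"FIRST": "top", "REST": "list"},  # recursive definition
    "ne-list": {"FIRST": "top", "REST": "list"},
    "top": {},
}

@lru_cache(maxsize=None)  # memoization: shared subtypes expand only once
def expand(type_name: str, depth: int) -> tuple:
    """Make the constraints on a type explicit, unfolding nested
    definitions at most `depth` levels deep."""
    if depth == 0:
        return ()  # leave the type unexpanded at this point
    return tuple(
        (feature, value, expand(value, depth - 1))
        for feature, value in definitions.get(type_name, {}).items()
    )

print(expand("ne-list", 2))
# (('FIRST', 'top', ()), ('REST', 'list', (('FIRST', 'top', ()), ('REST', 'list', ()))))
```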

    Integrating deep and shallow natural language processing components : representations and hybrid architectures

    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD along various dimensions such as configurability, multilinguality, and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support, and deep question analysis, as well as evaluations. In WHITEBOARD, e.g., it could be shown that shallow pre-processing increases both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and it eases the replication and comparability of results.
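
    The XSLT proposal can be pictured with a short sketch (hypothetical; uses the third-party lxml library, and the element names are invented): one component's XML output is transformed into the input vocabulary the next component expects.

```python
from lxml import etree

# Hypothetical output of a shallow component ...
shallow = etree.fromstring(
    "<tokens><tok pos='NE'>Berlin</tok><tok pos='VVFIN'>liegt</tok></tokens>"
)

# ... and a stylesheet mapping it onto a (made-up) deep-parser vocabulary.
stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/tokens">
    <input><xsl:apply-templates/></input>
  </xsl:template>
  <xsl:template match="tok">
    <w cat="{@pos}"><xsl:value-of select="."/></w>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(str(transform(shallow)))
# <?xml version="1.0"?>
# <input><w cat="NE">Berlin</w><w cat="VVFIN">liegt</w></input>
```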

    Semi-automating the reading programme for a historical dictionary project

    This paper describes the resources and software procedures used or developed in a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996), namely the semi-automatic generation of a digitally-sourced lexical database on which new and updated dictionary entries will be based, as well as the addition, in parallel, of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion-word part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for over 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) a list of potential new variant spellings and headword inclusion candidates. These steps replace, where recent electronic sources are concerned, the mechanical aspects of quotation gathering, normally undertaken manually through a reading programme requiring years of teamwork to acquire sufficient coverage (cf. Hicks 2010).
    Keywords: corpora, dictionary workflows, historical lexicography, language varieties, lexical databases, reading programmes, South African English
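
    The quotation-gathering step can be sketched roughly as follows (hypothetical Python; the word forms and source records are invented, and the real toolchain is far richer): scan dated source texts for known word forms and file each hit as a candidate illustrative quotation.

```python
import re
from collections import defaultdict

# Hypothetical known SAE word forms and dated source records.
known_forms = {"bakkie", "braai", "robot"}
sources = [
    ("2019", "Source A", "He loaded the bakkie and drove to the braai."),
    ("2021", "Source B", "Turn left at the robot, then go straight on."),
]

# Lexical database: word form -> candidate quotations with metadata.
quotations = defaultdict(list)

for year, source, text in sources:
    # Naive sentence split; a production pipeline would use a real tokenizer.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for token in set(re.findall(r"[a-z]+", sentence.lower())):
            if token in known_forms:
                quotations[token].append((year, source, sentence))

for form in sorted(quotations):
    print(form, "->", quotations[form])
```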

    An integrated architecture for shallow and deep processing

    We present an architecture for the integration of shallow and deep NLP components which is aimed at the flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
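
    A rough sketch of the text chart idea (hypothetical Python; the paper's actual chart is an XML-backed structure): each component writes its own annotation layer, keyed by character spans, into one shared object over an unchanged text.

```python
class TextChart:
    """Shared container: components add standoff annotation layers
    over one immutable text instead of passing rewritten strings along."""

    def __init__(self, text: str):
        self.text = text
        self.layers = {}  # component name -> list of (start, end, label)

    def add(self, component: str, start: int, end: int, label: str) -> None:
        self.layers.setdefault(component, []).append((start, end, label))

    def spans(self, component: str):
        """Yield (surface string, label) pairs for one component's layer."""
        for start, end, label in self.layers.get(component, []):
            yield self.text[start:end], label

chart = TextChart("Siemens kauft eine Firma.")
chart.add("ner", 0, 7, "company")   # named entity recognizer's layer
chart.add("chunker", 0, 7, "NP")    # chunk parser's layer
chart.add("chunker", 8, 24, "VP")
print(list(chart.spans("ner")))     # [('Siemens', 'company')]
```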

    TDL ExtraLight user's guide

    This paper serves as a user's guide to the first version of the type description language TDL, used for the specification of linguistic knowledge in the DISCO project of the DFKI.